Hindi Handwriting Recognition

Here, we apply convolutional neural networks (CNNs) to classify handwritten Hindi characters and digits!

A Balanced Multi-Class Image Classification Problem

Glimpse into our Data

Data Preparation

Train-Validation-Test Split

Class	Total	Train	Validation	Test
character_1	2000	1400	300	300
character_2	2000	1400	300	300
…	…	…	…	…
digit_9	2000	1400	300	300
Total	90,000	63,000	13,500	13,500

There are 36 characters and 9 digits, giving 45 classes with 2,000 images each, so this is a balanced multi-class classification problem. Given the split above, a batch size of 250 divides every split evenly: steps_per_epoch = 63,000 / 250 = 252 for training and 13,500 / 250 = 54 for validation and testing.
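
The step counts follow directly from the split sizes, since one epoch should pass over every image once. A minimal sketch of that arithmetic, using illustrative variable names that do not appear in the original code:

```r
# Images per class in each split, taken from the table above
n_classes  <- 45
per_class  <- c(train = 1400, validation = 300, test = 300)
batch_size <- 250

# Total images per split
n_images <- n_classes * per_class

# Batches needed to cover each split once; ceiling() also handles
# split sizes that are not an exact multiple of the batch size
steps <- ceiling(n_images / batch_size)
print(steps)
```

Computing the steps this way avoids hard-coding 252 and 54 if the batch size or the split ever changes.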

Setting Directory

original_dataset_dir = "C:/Users/KUNAL/Downloads/#R coding/#Books/covered/#Book - Manning - Deep Learning with R and Keras/## Article"

base_dir = "C:/Users/KUNAL/Downloads/#R coding/#Books/covered/#Book - Manning - Deep Learning with R and Keras/## Article/HindiCharacter"

train_dir = file.path(base_dir,"Training")
validation_dir = file.path(base_dir,"Validation")
testing_dir = file.path(base_dir,"Test")
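
The article does not show how the images were distributed into the Training, Validation and Test folders. One possible way to do it, as a sketch only: the helper `split_class()` and its arguments are hypothetical, and it assumes each class starts out in its own source folder.

```r
# Hypothetical helper: split one class folder into Training/Validation/Test.
# src_class_dir holds all images of a single class (2000 in this dataset).
split_class <- function(src_class_dir, base_dir,
                        n_train = 1400, n_val = 300, seed = 42) {
  files <- list.files(src_class_dir, full.names = TRUE)
  set.seed(seed)
  files <- sample(files)                      # shuffle before splitting
  idx <- seq_along(files)
  groups <- list(
    Training   = files[idx <= n_train],
    Validation = files[idx > n_train & idx <= n_train + n_val],
    Test       = files[idx > n_train + n_val]
  )
  class_name <- basename(src_class_dir)
  for (split in names(groups)) {
    # One sub-folder per class, as flow_images_from_directory() expects
    dest <- file.path(base_dir, split, class_name)
    dir.create(dest, recursive = TRUE, showWarnings = FALSE)
    file.copy(groups[[split]], dest)
  }
  invisible(lengths(groups))                  # number of files per split
}
```

Calling this once per class folder would produce the 1400/300/300 layout from the table, with the folder names matching train_dir, validation_dir and testing_dir above.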

Network Structure

# Define hyperparameters
img_height = 32
img_width = 32
batch_size = 250
num_classes = 45 

library(keras)

# Rescale pixel values from [0, 255] to [0, 1]
datagen = image_data_generator(rescale = 1/255)
train_generator <- flow_images_from_directory(train_dir,
                    datagen,target_size = c(img_height, img_width),
                    batch_size = batch_size,
                    class_mode = "categorical", 
                    color_mode = "grayscale")
# 63000 images belonging to 45 classes
val_generator   <- flow_images_from_directory(validation_dir,datagen,
                    target_size = c(img_height, img_width),
                    batch_size = batch_size,
                    class_mode = "categorical", 
                    color_mode = "grayscale")
# 13500 images belonging to 45 classes
test_generator  <- flow_images_from_directory(testing_dir,datagen,
                    target_size = c(img_height, img_width),
                    batch_size = batch_size,
                    class_mode = "categorical",
                    shuffle = FALSE,  # keep file order so predictions align with labels
                    color_mode = "grayscale")
# 13500 images belonging to 45 classes

# Model Structure
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu", 
                input_shape = c(img_height, img_width, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2),strides = c(2,2),padding = "same") %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2),strides = c(2,2),padding = "same") %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2),strides = c(2,2),padding = "same") %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = num_classes, activation = "softmax")

summary(model)
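
To see where the numbers in summary(model) come from, the feature-map sizes and parameter counts can be traced by hand. The sketch below uses base R only and assumes the Keras defaults for the convolution layers ("valid" padding, stride 1), which the model code above does not override. The helper names are illustrative.

```r
# Output size of a convolution with "valid" padding and stride 1
conv_out <- function(n, kernel) n - kernel + 1
# Output size of pooling with "same" padding
pool_out <- function(n, stride) ceiling(n / stride)

size <- 32                                   # img_height = img_width = 32
for (block in 1:3) {
  size <- conv_out(size, 3)
  cat("after conv", block, ":", size, "x", size, "\n")
  size <- pool_out(size, 2)
  cat("after pool", block, ":", size, "x", size, "\n")
}
flat_units <- size * size * 64               # input size of the dense classifier

# Trainable parameters: weights plus one bias per output unit
conv_params  <- function(kernel, in_ch, out_ch) (kernel * kernel * in_ch + 1) * out_ch
dense_params <- function(n_in, n_out) (n_in + 1) * n_out

total_params <- conv_params(3, 1, 32) + conv_params(3, 32, 32) +
  conv_params(3, 32, 64) + dense_params(flat_units, 64) + dense_params(64, 45)
cat("flattened units:", flat_units, " total parameters:", total_params, "\n")
```

The final feature map is 3 x 3 x 64, so the flatten layer feeds 576 values into the dense classifier. The dropout and pooling layers add no parameters.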

Compile, Train & Evaluate

# Compile
model %>% compile(loss="categorical_crossentropy",
                  optimizer=optimizer_adam(learning_rate = 0.001), 
                  metrics=c("acc"))
# Callbacks
filepath = file.path(original_dataset_dir,"Hindi_final_model.h5")

# Save the model each time validation accuracy reaches a new best
check = callback_model_checkpoint(filepath,monitor = 'val_acc',
                                  verbose = 1,
                                  save_best_only = TRUE,
                                  mode = "max")

early  = callback_early_stopping(monitor = "val_loss",
                                 mode = "min",
                                 patience = 3)

lr_red = callback_reduce_lr_on_plateau(monitor = "val_loss",
                                       patience = 2,
                                       verbose = 1, 
                                       factor = 0.3,
                                       min_lr = 0.000001)

callback_list = list(early,lr_red,check)

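# Train
# Note: fit_generator() is deprecated in recent keras releases, where fit() accepts generators directly
history <- model %>% fit_generator(train_generator,
                                   steps_per_epoch = 252,   # 63000 / batch_size
                                   epochs = 40,
                                   callbacks = callback_list,
                                   validation_data = val_generator,
                                   validation_steps = 54)   # 13500 / batch_size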

Callbacks used:

Model Checkpoint - saves the model whenever validation accuracy improves on the best value seen so far.
Early Stopping - stops training if validation loss has not decreased for 3 consecutive epochs.
Reduce LR on Plateau - multiplies the learning rate by 0.3 when validation loss has not improved for 2 consecutive epochs, down to a minimum of 0.000001.
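
The learning rates that Reduce LR on Plateau can reach are easy to work out in advance. A small sketch in base R, assuming the settings from the code above (initial rate 0.001, factor 0.3, minimum 0.000001):

```r
initial_lr <- 0.001      # from optimizer_adam(learning_rate = 0.001)
factor     <- 0.3        # from callback_reduce_lr_on_plateau(factor = 0.3)
min_lr     <- 0.000001   # the rate is never reduced below this floor

# Learning rate after 0, 1, 2, ... reductions, clipped at min_lr
reductions <- 0:6
lr <- pmax(initial_lr * factor^reductions, min_lr)
print(data.frame(reductions = reductions, learning_rate = signif(lr, 3)))
```

The sequence runs 0.001, 0.0003, 0.00009, 0.000027 and so on, and reaches the floor at the sixth reduction. Several of these values appear as the lr in the Remarks column of the experiments table further down, which suggests that column may record the learning rate in effect when training stopped.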

# Evaluate
model %>% evaluate(test_generator, steps = 54)
# 54/54 [==============================] - 7s 138ms/step - loss: 0.0737 - acc: 0.9784
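
Overall accuracy can hide weak classes, so a per-class breakdown is a natural next step. The sketch below is base R with toy data. The helper `per_class_accuracy()` is hypothetical, and it assumes you already have a matrix of predicted probabilities (for example from predicting on test_generator) and the matching 0-based true labels. Because the test generator uses shuffle = FALSE, the two should line up row for row.

```r
# Hypothetical helper: accuracy per class from predicted probabilities.
# probs:      matrix with one row per image and one column per class
# true_class: integer vector of 0-based class indices, in the same row order
per_class_accuracy <- function(probs, true_class) {
  predicted <- max.col(probs, ties.method = "first") - 1   # back to 0-based
  correct   <- predicted == true_class
  tapply(correct, true_class, mean)                        # share correct per class
}

# Toy example with 3 classes and 4 images
probs <- rbind(c(0.8, 0.1, 0.1),
               c(0.2, 0.7, 0.1),
               c(0.6, 0.3, 0.1),   # true class is 1, but class 0 is predicted
               c(0.1, 0.2, 0.7))
true_class <- c(0, 1, 1, 2)
acc <- per_class_accuracy(probs, true_class)
print(acc)
```

In the toy data, classes 0 and 2 are classified perfectly while class 1 gets one of its two images wrong.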

Visualising our Network

Figure: Character 4 (left) and Digit 6 (right)

Activation maps for Character_4 and Digit_6 at each pooling layer:

activations_2_max_pooling2d_19
activations_4_max_pooling2d_18
activations_6_max_pooling2d_17

As we can see, the first layer acts like an ‘edge detector’, picking up the strokes that make up the characters and digits. As we move to higher layers, the representations become more and more abstract. Some activation maps are also completely blank: those filters did not fire for this input, meaning the pattern they encode is absent from the image.
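
Blank activation maps can also be found programmatically rather than by eye. A minimal sketch in base R, assuming the activations of one layer for one image are available as a (height, width, channels) array. The helper `blank_channels()` is hypothetical, and the extraction of the activations from the model is not shown here.

```r
# Hypothetical helper: indices of channels whose activation map is entirely zero.
# act: numeric array of shape (height, width, channels) for a single image
blank_channels <- function(act, tol = 0) {
  which(apply(act, 3, function(ch) max(abs(ch)) <= tol))
}

# Toy activation with 4 channels, where channels 2 and 4 never fire
act <- array(0, dim = c(3, 3, 4))
act[, , 1]  <- 0.5
act[2, 2, 3] <- 1.2
print(blank_channels(act))
```

Comparing the blank channels for Character_4 and Digit_6 would show which filters respond to one input but not the other.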

Structural Experimenting

Sl.	Structure	Training accuracy	Validation accuracy	Test accuracy	Remarks
1	(32,64 pool: 2,5 stride: 2,5) without dropout layer	97.66%	91.24%	95.49%	Overfitting
2	(32,64 pool: 2,5 stride: 2,5) with 50% dropout	91.18%	93.48%	96.41%	epochs = 10/40, lr = 0.0001
3	(32,64 pool: 2,5 stride: 2,5) with reduced learning rate	94.36%	94.87%	97.39%	epochs = 12/40, lr = 0.00009
4	(32,32,64 pool: 2,2,5 stride: 2,2,5) with 50% dropout	93.25%	94.62%	97.20%	epochs = 25/40, lr = 0.0003
5	(32,32,64 pool: 2,2,2 stride: 2,2,2) with 50% dropout	96.28%	94.33%	97.59%	epochs = 19/40, lr = 0.000027
6	(32,64 pool: 2,2,2 stride: 2,2,2) with 50% dropout	97.70%	93.19%	96.86%	epochs = 16/40, lr = 0.00009
7	32,32,64 - kernel 3,3,3 - pool 2,2,2 - stride 2,2,2 - classifier 64 - 50% dropout ***	96.64%	95.07%	97.84%	epochs = 22/40, lr = 0.000027
8	32,32,64 - kernel 5,5,5 - pool 2,2,5 - stride 2,2,5 - classifier 64 - 50% dropout	88.31%	92.44%	95.68%	epochs = 17/40, lr = 0.00009

We see that Model 7 performs best, with the highest validation accuracy (95.07%) and the highest test accuracy (97.84%). It has three convolution layers with 32, 32 and 64 filters and 3x3 kernels, each followed by 2x2 max pooling with a stride of 2, then a 50% dropout layer and a dense classifier with 64 neurons.
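
The comparison can be reproduced from the table itself. The sketch below copies the accuracies into a data frame (column names are illustrative) and ranks the models. Selecting on validation accuracy is the usual practice, since the test set is meant to stay untouched until the end, and here both criteria point to the same model.

```r
# Accuracies (%) copied from the experiments table above
results <- data.frame(
  model = 1:8,
  train = c(97.66, 91.18, 94.36, 93.25, 96.28, 97.70, 96.64, 88.31),
  val   = c(91.24, 93.48, 94.87, 94.62, 94.33, 93.19, 95.07, 92.44),
  test  = c(95.49, 96.41, 97.39, 97.20, 97.59, 96.86, 97.84, 95.68)
)

# A large positive train-validation gap is a sign of overfitting
results$train_val_gap <- results$train - results$val

best_by_val  <- results$model[which.max(results$val)]
best_by_test <- results$model[which.max(results$test)]
cat("best by validation:", best_by_val, " best by test:", best_by_test, "\n")

# Models ranked by validation accuracy, best first
print(results[order(-results$val), ])
```

The largest train-validation gap belongs to Model 1, the one without dropout, which agrees with the Overfitting remark in the table.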

Model 7 achieves a test accuracy of 97.84%!